Some Tests Of An Unsupervised Model Of Language Acquisition
Abstract
We outline an unsupervised language acquisition algorithm and offer some psycholinguistic support for a model based on it. Our approach resembles Construction Grammar in its general philosophy, and Tree Adjoining Grammar in its computational characteristics. The model is trained on a corpus of transcribed child-directed speech (CHILDES). The model's ability to process novel inputs makes it capable of taking various standard tests of English that rely on forced-choice judgment and on magnitude estimation of linguistic acceptability. We report encouraging results from several such tests, and discuss the limitations revealed by other tests in our present method of dealing with novel stimuli.

1 The empirical problem of language acquisition

The largely unsupervised, amazingly fast and almost invariably successful learning stint that is language acquisition by children has long been the envy of computer scientists (Bod, 1998; Clark, 2001; Roberts and Atwell, 2002) and a daunting enigma for linguists (Chomsky, 1986; Elman et al., 1996). Computational models of language acquisition or "grammar induction" are usually divided into two categories, depending on whether they subscribe to the classical generative theory of syntax, or invoke "general-purpose" statistical learning mechanisms.

We believe that polarization between classical and statistical approaches to syntax hampers the integration of the stronger aspects of each method into a common powerful framework. On the one hand, the statistical approach is geared to take advantage of the considerable progress made to date in the areas of distributed representation and probabilistic learning, yet generic "connectionist" architectures are ill-suited to the abstraction and processing of symbolic information. On the other hand, classical rule-based systems excel in just those tasks, yet are brittle and difficult to train.
We are developing an approach to the acquisition of distributional information from raw input (e.g., transcribed speech corpora) that also supports the distillation of structural regularities comparable to those captured by Context Sensitive Grammars out of the accrued statistical knowledge. In thinking about such regularities, we adopt Langacker's notion of grammar as "simply an inventory of linguistic units" (Langacker, 1987, p. 63). To detect potentially useful units, we identify and process partially redundant sentences that share the same word sequences. We note that the detection of paradigmatic variation within a slot in a set of otherwise identical aligned sequences (syntagms) is the basis for the classical distributional theory of language (Harris, 1954), as well as for some modern work (van Zaanen, 2000). Likewise, the pattern (the syntagm and the equivalence class of complementary-distribution symbols that may appear in its open slot) is the main representational building block of our system, ADIOS (for Automatic DIstillation Of Structure).

Our goal in the present short paper is to illustrate some of the capabilities of the representations learned by our method vis-à-vis standard tests used by developmental psychologists, by second-language instructors, and by linguists. Thus, the main computational principles behind the ADIOS model are outlined here only briefly. The algorithmic details of our approach and accounts of its learning from CHILDES corpora appear elsewhere (Solan et al., 2003a; Solan et al., 2003b; Solan et al., 2004; Edelman et al., 2004).

2 The principles behind the ADIOS algorithm

The representational power of ADIOS and its capacity for unsupervised learning rest on three principles: (1) probabilistic inference of pattern significance, (2) context-sensitive generalization, and (3) recursive construction of complex patterns. Each of these is described briefly below.
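As a rough illustration of the pattern notion introduced above (a syntagm with one open slot, plus the equivalence class of words that fill it), the following Python sketch groups sentences that are identical except at a single position. It is not the ADIOS algorithm: it omits the probabilistic significance test and the recursive construction of patterns, and the function name `find_slot_patterns`, the `SLOT` marker, and the toy corpus are assumptions made only for this example.

```python
from collections import defaultdict

SLOT = "___"  # hypothetical marker for the open slot in a syntagm


def find_slot_patterns(sentences, min_fillers=2):
    """Return frames with one open slot and the words observed in that slot.

    A frame is a sentence with exactly one position replaced by SLOT.
    Sentences that share a frame are identical except at that position,
    so the words found there form a candidate equivalence class.
    """
    fillers = defaultdict(set)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            # Blank out position i to obtain a candidate frame.
            frame = tuple(words[:i] + [SLOT] + words[i + 1:])
            fillers[frame].add(word)
    # Keep only frames whose slot shows paradigmatic variation.
    return {
        " ".join(frame): sorted(slot_words)
        for frame, slot_words in fillers.items()
        if len(slot_words) >= min_fillers
    }


# Toy corpus (an assumption for illustration, not CHILDES data).
corpus = [
    "the cat jumps",
    "the dog jumps",
    "the horse jumps",
    "the dog laughs",
]

patterns = find_slot_patterns(corpus)
for frame, slot_words in sorted(patterns.items()):
    print(frame, "->", " | ".join(slot_words))
```

On this toy corpus the sketch finds two frames, one with a noun slot and one with a verb slot. A full system would additionally have to decide which of these candidates are statistically significant and how to reuse them inside larger patterns.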
[Figure: a hierarchy of ADIOS patterns (P) and equivalence classes (E); the image itself is not recoverable. Legible equivalence classes: E50 = bird | cat | cow | dog | horse | rabbit; E53 = Beth | Cindy | George | Jim | Joe | Pam; E60 = flies | jumps | laughs; E62 = adores | loves | scolds | worships; E64 = Beth | Cindy | George | Jim | Joe | Pam | P49 | P51; E85 = annoyes | bothers | disturbes | worries. Legible patterns: P48 = ", doesn't it"; P49 = a E50; P51 = the E50; P61 = who E62. Other nodes shown: P58, P63, P84, E63.]